Bayesian Control of Large MDPs with Unknown Dynamics in Data-Poor Environments
We propose a Bayesian decision-making framework for control of Markov Decision Processes (MDPs) with unknown dynamics and large, possibly continuous, state, action, and parameter spaces in data-poor environments. Most existing adaptive controllers for MDPs with unknown dynamics are based on the reinforcement learning framework and rely on large data sets acquired by sustained direct interaction with the system or via a simulator. Such interaction is not feasible in many applications because of ethical, economic, and physical constraints. The proposed framework addresses the data-poverty issue by decomposing the problem into an offline planning stage, which does not rely on sustained direct interaction with the system or a simulator, and an online execution stage. In the offline stage, parallel Gaussian process temporal difference (GPTD) learning techniques are employed for near-optimal Bayesian approximation of the expected discounted reward over a sample drawn from the prior distribution of the unknown parameters. In the online stage, the action with the maximum expected return with respect to the posterior distribution of the parameters is selected. This is achieved by approximating the posterior distribution with a Markov Chain Monte Carlo (MCMC) algorithm and then constructing multiple Gaussian processes over the parameter space for efficient prediction of the means of the expected return at the MCMC samples. The effectiveness of the proposed framework is demonstrated using a simple dynamical system model with continuous state and action spaces, as well as a more complex model for a metastatic melanoma gene regulatory network observed through noisy synthetic gene expression data.
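The online stage described above can be illustrated with a minimal Python sketch. It is not the authors' implementation; everything in it is an assumption made for illustration: a scalar unknown parameter `theta`, two candidate actions, synthetic offline returns standing in for the GPTD outputs, plain zero-mean Gaussian process regression with an RBF kernel (not GPTD), and a random-walk Metropolis-Hastings sampler as the MCMC step. The helper names `rbf_kernel`, `gp_posterior_mean`, `metropolis_hastings`, and `select_action` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two 1-D arrays of parameter values."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_mean(theta_train, q_train, theta_query, noise=1e-4):
    """Mean of a zero-mean GP regression over the parameter space."""
    K = rbf_kernel(theta_train, theta_train) + noise * np.eye(len(theta_train))
    K_star = rbf_kernel(theta_query, theta_train)
    return K_star @ np.linalg.solve(K, q_train)

def metropolis_hastings(log_post, theta0, n_samples, step, rng):
    """Random-walk Metropolis-Hastings for a scalar parameter."""
    samples = np.empty(n_samples)
    theta, lp = theta0, log_post(theta0)
    for i in range(n_samples):
        proposal = theta + step * rng.normal()
        lp_proposal = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_proposal - lp:
            theta, lp = proposal, lp_proposal
        samples[i] = theta
    return samples

def select_action(theta_prior, q_offline, theta_posterior):
    """Pick the action whose GP-predicted return, averaged over the
    posterior (MCMC) samples, is largest. One GP is fitted per action."""
    expected = np.array([
        gp_posterior_mean(theta_prior, q_a, theta_posterior).mean()
        for q_a in q_offline
    ])
    return int(np.argmax(expected)), expected

rng = np.random.default_rng(0)

# Offline stage (assumed, synthetic): returns of two actions at the current
# state, evaluated at parameter values standing in for prior draws.
theta_prior = np.linspace(0.0, 1.0, 15)
q_offline = np.stack([theta_prior, 1.0 - theta_prior])

# Online stage: noisy observations of theta (true value 0.8, noise sd 0.1).
observations = 0.8 + 0.1 * rng.normal(size=20)

def log_post(theta):
    """Uniform prior on [0, 1] combined with a Gaussian likelihood."""
    if not 0.0 <= theta <= 1.0:
        return -np.inf
    return -0.5 * np.sum((observations - theta) ** 2) / 0.1 ** 2

# Discard the first 500 draws as burn-in.
samples = metropolis_hastings(log_post, 0.5, 2000, 0.05, rng)[500:]
best_action, expected_returns = select_action(theta_prior, q_offline, samples)
print(best_action, expected_returns)
```

In this toy setup the return of action 0 grows with `theta` and that of action 1 shrinks, so a posterior concentrated near 0.8 favors action 0. Because the expensive offline quantities are only interpolated over the parameter space at decision time, no further interaction with the system is needed beyond the observations used to form the posterior.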